Iterative Multiple Component Analysis with an Entropy-based Dissimilarity Measure
Abstract
In this paper we study the notion of entropy for a set of attributes of a table and propose a novel method to measure the dissimilarity of categorical data. Experiments show that our estimation method improves the accuracy of the popular unsupervised Self Organized Map (SOM) in comparison to the Euclidean or Mahalanobis distance. The distance comparison is applied to the clustering of multidimensional contingency tables. Two factors make our distance function attractive: first, its general framework can be extended to other classes of problems; second, the measure can be normalized to obtain a coefficient similar to, for instance, Pearson’s coefficient of contingency.
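The abstract does not state the exact form of the proposed measure, so the following is only a minimal sketch of the general idea, under stated assumptions. It computes the Shannon entropy of a set of attributes of a table and derives from it a normalized dissimilarity between two categorical attributes. The quantity used here is the normalized variation of information, a standard entropy-based measure swapped in for illustration and not the authors' measure. The function names and the row-as-dictionary table layout are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the joint distribution of `attrs`.

    `rows` is a list of dicts (one per table row); `attrs` is a list of keys.
    This table layout is an assumption made for the example.
    """
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def normalized_dissimilarity(rows, a, b):
    """Normalized variation of information between attributes a and b.

    Assumption: this stands in for the paper's measure, which the abstract
    does not specify. The value lies in [0, 1]: 0 when each attribute
    determines the other, 1 when the two attributes are independent.
    """
    h_a = entropy(rows, [a])
    h_b = entropy(rows, [b])
    h_ab = entropy(rows, [a, b])
    if h_ab == 0:
        # Both attributes are constant, so they carry no information at all.
        return 0.0
    return (2 * h_ab - h_a - h_b) / h_ab


# Toy table: "color" and "size" are independent, "shade" copies "color".
table = [
    {"color": "red", "size": "S", "shade": "red"},
    {"color": "red", "size": "L", "shade": "red"},
    {"color": "blue", "size": "S", "shade": "blue"},
    {"color": "blue", "size": "L", "shade": "blue"},
]

d_independent = normalized_dissimilarity(table, "color", "size")
d_identical = normalized_dissimilarity(table, "color", "shade")
```

Dividing by the joint entropy bounds the result between 0 and 1, which loosely mirrors the role that normalization plays in Pearson’s coefficient of contingency, although the two coefficients are defined differently.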
Similar resources
Iterative multiple component analysis with a Renyi entropy-based dissimilarity measure
In this paper, we study the notion of entropy for a set of attributes of a table and propose a novel method to measure the dissimilarity of categorical data. Experiments show that our estimation method improves the accuracy of the popular unsupervised Self Organized Map (SOM), in comparison to Euclidean or Mahalanobis distance. The distance comparison is applied for clustering of multidimension...
Incremental entropy-based clustering on categorical data streams with concept drift
Clustering on categorical data streams is a relatively new field that has not received as much attention as static data and numerical data streams. One of the main difficulties in categorical data analysis is the lack of an appropriate way to define the similarity or dissimilarity measure on data. In this paper, we propose three dissimilarity measures: a point-cluster dissimilarity measure (base...
On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering
Hybrid clustering combines partitional and hierarchical clustering for computational effectiveness and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem a...
Clustering of gene expression data using random forest dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) in comparison with the number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations, which is calculated by a distance function. As increa...
Evaluation of Health System Development Plan and Basic Education Transformation Plan Based on Health System Assumptions with Emphasis on Education
Background and Objective: Health education and health promotion are considered an important source of economic, social and individual development. Governments have an important role in treating this as a crucial matter, and all human beings need training to achieve this worthwhile goal, namely health. Methods: This study was carried out using content analysis “Shannon Entropy”. In this method ...
Publication date: 2007